Identifying and targeting high-value customer segments to maximize customer lifetime value.
Developing personalized marketing strategies to increase customer engagement and loyalty.
Optimizing marketing spend and resource allocation based on customer segment characteristics.
Improving customer retention and reducing churn rates through targeted retention efforts.
Enhancing overall business profitability by focusing on the most valuable customer segments.
In the highly competitive e-commerce landscape, a one-size-fits-all approach is no longer effective for customer engagement and retention.
Understanding individual customer behavior and preferences is crucial for developing targeted marketing strategies that resonate with each segment.
Personalized communications, product recommendations, and promotions can significantly improve customer satisfaction, loyalty, and lifetime value.
The shift in consumer behavior post-COVID has accelerated the need for businesses to adopt data-driven, customer-centric strategies to remain competitive.
Effective customer segmentation and targeted marketing efforts can lead to higher conversion rates, increased average order value, and improved overall business performance.
The specific business challenge is to leverage customer transaction data to identify and characterize distinct customer segments based on their purchasing behavior, and to develop targeted marketing strategies that maximize customer engagement, retention, and profitability.
Analyze customer transaction data to identify meaningful customer segments using the RFM (Recency, Frequency, Monetary) framework.
Develop a deep understanding of each segment’s characteristics, preferences, and purchasing patterns.
Create personalized marketing strategies tailored to each segment, focusing on cross-selling, upselling, and retention efforts.
Optimize marketing spend and resource allocation based on the value and potential of each customer segment.
Monitor and assess the effectiveness of targeted marketing campaigns, making data-driven decisions to continually improve customer engagement and business performance.
By addressing this specific business challenge, the e-commerce company aims to enhance customer loyalty, increase customer lifetime value, and ultimately drive business growth and profitability in the competitive online marketplace.
Source
Description
Original
Online_Sales.csv
CustomersData.csv
Marketing_Spend.csv
Discount_Coupon.csv
Tax_amount.csv
Transformed Into…
eCommerce.csv
N = 52,924
15 Variables
No Missing Values
CustomerID
Transaction_ID
Transaction_Date
Product_SKU
Product_Description
Product_Category
Quantity
Avg_Price
Delivery_Charges
Coupon_Status
Month
Coupon_Code
Discount_pct
GST
Invoice
head(df) & tail(df)head(df)tail(df)describe(df), str(df), &
summary(df)describe(df)## vars n mean sd median
## CustomerID 1 52924 15346.7098103 1766.55602034 15311.0000
## Transaction_ID 2 52924 32409.8256745 8648.66897695 32625.5000
## Transaction_Date* 3 52924 186.8028872 101.72835957 194.0000
## Product_SKU* 4 52924 762.2928728 308.14316731 917.0000
## Product_Description* 5 52924 229.2883191 105.15812454 259.0000
## Product_Category* 6 52924 10.7620550 6.70540005 13.0000
## Quantity 7 52924 4.4976381 20.10471082 1.0000
## Avg_Price 8 52924 52.2376464 64.00688160 16.9900
## Delivery_Charges 9 52924 10.5176304 19.47561323 6.0000
## Coupon_Status* 10 52924 1.8295291 0.90444395 1.0000
## Month* 11 52924 6.3685662 3.47001687 6.0000
## Coupon_Code* 12 52924 28.4509674 13.37335915 33.0000
## Discount_pct 13 52924 19.8023581 8.27887768 20.0000
## GST 14 52924 0.1374618 0.04582478 0.1800
## Invoice 15 52924 101.9831975 172.36572872 45.6362
## trimmed mad min max range
## CustomerID 15360.9325224 2326.19940 12346.0000 18283.000 5937.000
## Transaction_ID 32385.1758621 10188.42720 16679.0000 48497.000 31818.000
## Transaction_Date* 187.6798299 124.53840 1.0000 365.000 364.000
## Product_SKU* 803.1309872 100.81680 1.0000 1145.000 1144.000
## Product_Description* 236.9775154 99.33420 1.0000 404.000 403.000
## Product_Category* 10.7015588 8.89560 1.0000 20.000 19.000
## Quantity 1.5277279 0.00000 1.0000 900.000 899.000
## Avg_Price 42.3079039 21.03809 0.3900 355.740 355.350
## Delivery_Charges 6.9459700 0.00000 0.0000 521.360 521.360
## Coupon_Status* 1.7869154 0.00000 1.0000 3.000 2.000
## Month* 6.3324516 4.44780 1.0000 12.000 11.000
## Coupon_Code* 28.9720123 14.82600 1.0000 46.000 45.000
## Discount_pct 19.8474256 14.82600 0.0000 30.000 30.000
## GST 0.1417222 0.00000 0.0500 0.180 0.130
## Invoice 74.1157519 50.78706 4.6035 8979.275 8974.672
## skew kurtosis se
## CustomerID -0.032637368 -1.2343684 7.678936633
## Transaction_ID 0.005581206 -1.0254602 37.594381536
## Transaction_Date* -0.066141763 -1.0976880 0.442196918
## Product_SKU* -1.001510104 -0.5107442 1.339449091
## Product_Description* -0.513630742 -0.9404206 0.457105558
## Product_Category* -0.082542602 -1.7885356 0.029147302
## Quantity 19.033722579 525.3827622 0.087391964
## Avg_Price 1.632487265 3.3417329 0.278227683
## Delivery_Charges 11.959061423 204.6103981 0.084657378
## Coupon_Status* 0.341717319 -1.6909090 0.003931473
## Month* 0.041208978 -1.2337928 0.015083609
## Coupon_Code* -0.215521921 -1.6877942 0.058131854
## Discount_pct -0.043117471 -1.3546626 0.035986958
## GST -0.351426450 -1.3971437 0.000199193
## Invoice 16.275019561 570.8286389 0.749246270
str(df)## 'data.frame': 52924 obs. of 15 variables:
## $ CustomerID : int 17850 17850 17850 17850 17850 17850 17850 17850 17850 13047 ...
## $ Transaction_ID : int 16679 16680 16681 16682 16682 16682 16682 16682 16682 16682 ...
## $ Transaction_Date : chr "2019-01-01" "2019-01-01" "2019-01-01" "2019-01-01" ...
## $ Product_SKU : chr "GGOENEBJ079499" "GGOENEBJ079499" "GGOEGFKQ020399" "GGOEGAAB010516" ...
## $ Product_Description: chr "Nest Learning Thermostat 3rd Gen-USA - Stainless Steel" "Nest Learning Thermostat 3rd Gen-USA - Stainless Steel" "Google Laptop and Cell Phone Stickers" "Google Men's 100% Cotton Short Sleeve Hero Tee Black" ...
## $ Product_Category : chr "Nest-USA" "Nest-USA" "Office" "Apparel" ...
## $ Quantity : int 1 1 1 5 1 15 15 15 5 52 ...
## $ Avg_Price : num 153.71 153.71 2.05 17.53 16.5 ...
## $ Delivery_Charges : num 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 6.5 ...
## $ Coupon_Status : chr "Used" "Used" "Used" "Not Used" ...
## $ Month : chr "Jan" "Jan" "Jan" "Jan" ...
## $ Coupon_Code : chr "ELEC10" "ELEC10" "OFF10" "SALE10" ...
## $ Discount_pct : num 10 10 10 10 10 10 10 10 10 10 ...
## $ GST : num 0.1 0.1 0.1 0.18 0.18 0.18 0.18 0.18 0.18 0.1 ...
## $ Invoice : num 158.67 158.67 8.53 109.93 24.02 ...
summary(df)## CustomerID Transaction_ID Transaction_Date Product_SKU
## Min. :12346 Min. :16679 Length:52924 Length:52924
## 1st Qu.:13869 1st Qu.:25384 Class :character Class :character
## Median :15311 Median :32626 Mode :character Mode :character
## Mean :15347 Mean :32410
## 3rd Qu.:16996 3rd Qu.:39126
## Max. :18283 Max. :48497
## Product_Description Product_Category Quantity Avg_Price
## Length:52924 Length:52924 Min. : 1.000 Min. : 0.39
## Class :character Class :character 1st Qu.: 1.000 1st Qu.: 5.70
## Mode :character Mode :character Median : 1.000 Median : 16.99
## Mean : 4.498 Mean : 52.24
## 3rd Qu.: 2.000 3rd Qu.:102.13
## Max. :900.000 Max. :355.74
## Delivery_Charges Coupon_Status Month Coupon_Code
## Min. : 0.00 Length:52924 Length:52924 Length:52924
## 1st Qu.: 6.00 Class :character Class :character Class :character
## Median : 6.00 Mode :character Mode :character Mode :character
## Mean : 10.52
## 3rd Qu.: 6.50
## Max. :521.36
## Discount_pct GST Invoice
## Min. : 0.0 Min. :0.0500 Min. : 4.604
## 1st Qu.:10.0 1st Qu.:0.1000 1st Qu.: 20.160
## Median :20.0 Median :0.1800 Median : 45.636
## Mean :19.8 Mean :0.1375 Mean : 101.983
## 3rd Qu.:30.0 3rd Qu.:0.1800 3rd Qu.: 137.400
## Max. :30.0 Max. :0.1800 Max. :8979.275
As the Avg_Price or Invoice amount
increases, the GST (Goods and Services Tax) as a percentage
of the total purchase falls.
With an increase of Quantity comes an increase in
total Invoice amount and
Delivery_Charges.
The larger the Invoice amount, the larger the
Delivery_Charges.
The slope of the linear regression line between
Quantity and Invoice value is 4.37.
# RFM component: Monetary
# Calculate the total transaction amount for each line item in the dataset.
# The formula includes the product of quantity and average price, adds the
# Goods and Services Tax (GST), and also adds any delivery charges associated
# with the transaction.
df$amount <-
df$Quantity * df$Avg_Price * (1 + df$GST) + df$Delivery_Charges
# Aggregate the total monetary value by each customer. This uses the 'aggregate' function to sum up all the
# amounts spent per customer, identified by 'CustomerID'.
# The result is stored in 'rfm.m', which holds the
# total monetary value for each customer.
rfm.m <- aggregate(amount ~ CustomerID, data = df, sum)
# RFM component: Recency
# Identify the latest date in the dataset.
# This is done by accessing the last row's date from the
# 'Transaction_Date' column, assuming the data is ordered by date.
latestdate <-
df$Transaction_Date[nrow(df)] # last date in the dataset
# Calculate the number of days since the latest transaction
# for each transaction in the dataset.
# 'difftime' computes the difference in days between the 'latestdate' and each transaction's date.
df$days_since_latest <-
as.integer(difftime(latestdate, df$Transaction_Date, units = "days"))
# Aggregate the minimum days since last transaction by each custome
# to find out the most recent purchase.
# This uses 'aggregate' to find the minimum 'days_since_latest' for each 'CustomerID', storing the result in 'rfm.r'.
rfm.r <- aggregate(days_since_latest ~ CustomerID, data = df, min)
# RFM component: Frequency
# Calculate the frequency of transactions for each customer. The frequency is determined by counting the
# unique transaction IDs per customer. This involves using the
# 'aggregate' function with a custom function
# that calculates the length of unique transaction IDs.
rfm.f <- aggregate(
Transaction_ID ~ CustomerID,
data = df,
FUN = function(x)
length(unique(x))
)
# Combine RFM
# Combine the results from the monetary, recency, and
# frequency calculations into a single data frame.
# 'cbind' combines the data frames by columns.
rfm <- cbind(rfm.m, rfm.r, rfm.f)
# Convert the combined object into a dataframe if it is not already.
rfm <- as.data.frame(rfm)
# Remove duplicate rows based on all columns, ensuring that each row
# in 'rfm' is unique.
rfm <- rfm[!duplicated(as.list(rfm))]
# Rename the columns to more descriptive names reflecting the RFM components.
names(rfm)[names(rfm) == "Transaction_ID"] <- "frequency"
names(rfm)[names(rfm) == "amount"] <- "monetary"
names(rfm)[names(rfm) == "days_since_latest"] <- "recency"
# Remove redundant datasets
# Clean up the environment by removing temporary data frames
# used in the calculation process.
# This helps free up memory and avoids clutter in the workspace.
rm(rfm.f, rfm.r, rfm.m)## Correlation between RFM R score and Recency: -0.96
| R_Level | F_Level | M_Level | Customers | Average Recency | Average Frequency | Average Monetary |
|---|---|---|---|---|---|---|
| High | High | High | 296 | 43.1047 | 41.3007 | 9,347.5797 |
| High | High | Medium | 17 | 42.1765 | 16.0000 | 2,559.1191 |
| High | Low | High | 2 | 81.5000 | 2.0000 | 6,131.4825 |
| High | Low | Low | 50 | 49.8600 | 1.8600 | 266.7431 |
| High | Low | Medium | 20 | 42.6000 | 2.7500 | 1,019.4593 |
| High | Medium | High | 23 | 34.3478 | 11.7826 | 3,515.0455 |
| High | Medium | Low | 14 | 52.1429 | 5.8571 | 477.5779 |
| High | Medium | Medium | 164 | 46.1402 | 8.6829 | 1,695.1207 |
| Low | High | High | 71 | 305.4366 | 31.5211 | 7,482.2911 |
| Low | High | Medium | 7 | 287.7143 | 17.7143 | 2,576.4558 |
| Low | Low | High | 1 | 268.0000 | 4.0000 | 7,753.2688 |
| Low | Low | Low | 73 | 299.7260 | 1.6301 | 211.3750 |
| Low | Low | Medium | 19 | 300.3684 | 3.0000 | 1,245.1215 |
| Low | Medium | High | 9 | 292.7778 | 12.0000 | 3,781.8340 |
| Low | Medium | Low | 9 | 299.3333 | 4.5556 | 458.2323 |
| Low | Medium | Medium | 105 | 304.8762 | 7.9143 | 1,589.7832 |
| Medium | High | High | 167 | 159.8443 | 34.2635 | 7,151.9321 |
| Medium | High | Medium | 28 | 157.4286 | 17.4643 | 2,511.5858 |
| Medium | Low | Low | 111 | 164.0721 | 1.8468 | 194.0639 |
| Medium | Low | Medium | 18 | 163.0000 | 2.8333 | 1,035.4046 |
| Medium | Medium | High | 17 | 168.8235 | 11.1765 | 4,508.9259 |
| Medium | Medium | Low | 37 | 171.7297 | 5.6486 | 453.6616 |
| Medium | Medium | Medium | 210 | 168.4857 | 8.6524 | 1,526.3793 |
| Group | Avg_Discount |
|---|---|
| Champions (Best customers with highest recency and frequency) | 21.82% |
| Loyal customers (High recency and frequency, but not as recent as Champions) | 19.08% |
| Potential Loyalists (High recency but lower frequency than Loyal customers) | 17.20% |
| New Customers (Highest recency but lowest frequency) | 27.67% |
| Promising (High recency but lowest frequency) | 15.89% |
| Need attention (Medium recency and frequency) | 20.84% |
| About to sleep (Medium recency but low frequency) | 21.25% |
| At risk (Low recency and medium frequency) | 19.56% |
| Can’t lose them (Low recency but highest frequency) | 16.98% |
| Hibernating (Low recency and low frequency) | 18.33% |
## The table above shows the average discount percentage for each customer segment based on their recency and frequency scores.
## Some key observations:
## - Champions and Loyal customers, who have high recency and frequency, tend to have higher average discounts compared to other segments.
## - New Customers and Promising segments, despite high recency, have lower average discounts due to their low frequency.
## - Hibernating customers, with low recency and frequency, have the lowest average discounts among all segments.
During the course of this RFM segmentation project, several empirical challenges were encountered. These challenges highlight the complexities and considerations involved in working with real-world customer data. Some of the key challenges faced include:
One of the primary challenges was ensuring the quality and integrity of the customer transaction data. The dataset required thorough cleaning, preprocessing, and validation to handle missing values, outliers, and inconsistencies. Ensuring accurate and reliable data was crucial for the validity of the RFM analysis and segmentation results.
The e-commerce dataset contained a substantial number of customer records and transactions. Processing and analyzing large volumes of data presented computational challenges. Efficient data manipulation techniques and optimized algorithms were necessary to handle the dataset effectively and generate timely insights.
Determining the appropriate definitions and thresholds for the Recency, Frequency, and Monetary metrics was a critical challenge. The selection of suitable time frames, such as considering the most recent transaction date or the entire transaction history, required careful consideration. Setting appropriate thresholds for categorizing customers into different RFM segments also involved iterative experimentation and domain expertise.
The distributions of the RFM metrics, particularly the Frequency and Monetary values, exhibited significant skewness. Dealing with skewed data required applying appropriate transformations, such as logarithmic scaling, to achieve more normalized distributions. This ensured that the RFM scores and segments accurately reflected the underlying customer behavior patterns.
The presence of outliers in the dataset, such as customers with exceptionally high or low RFM values, posed challenges in the segmentation process. Outliers can potentially distort the RFM scores and segment assignments. Identifying and handling outliers required careful consideration, such as setting appropriate thresholds or applying robust statistical techniques to mitigate their impact.
Determining the optimal number of customer segments was another empirical challenge. While the RFM framework suggests using quintiles, the specific number of segments may vary based on the business context and data characteristics. Balancing segment granularity and interpretability required iterative experimentation and evaluation of different segmentation approaches.
Validating the RFM segmentation results and interpreting the characteristics of each segment posed challenges. Assessing the stability and robustness of the segments across different time periods or data subsets was important to ensure the reliability of the insights. Interpreting the segment characteristics and translating them into actionable business strategies required collaboration with domain experts and stakeholders.
Addressing these empirical challenges required a combination of data preprocessing techniques, statistical methods, and domain knowledge. Iterative refinement, sensitivity analysis, and collaborative discussions with business stakeholders were essential to overcome these challenges and derive meaningful insights from the RFM segmentation analysis.
Offer exclusive discounts or incentives.
Personalize marketing communications and product recommendations.
Consider subscription options or VIP benefits.
Create compelling promotions like BOGO deals or bundle pricing.
Develop targeted content to restore interest.
Use retargeting ads and personalized messaging.
Focus on brand awareness and re-engagement.
Encourage social media interactions.
Consider reactivation campaigns with incentives.
Predictive Analytics
Forecast customer behavior.
Proactively target customers with personalized offers.
Predict likelihood to purchase, churn probability, or expected lifetime value.
Sentiment Analysis
Analyze customer reviews, feedback, and social media interactions.
Gain insights into customer satisfaction, preferences, and pain points.
Tailor marketing messages and improve product offerings.
Customer Demographics
Incorporate age, gender, location, income level, or education data.
Create comprehensive customer profiles.
Personalize marketing campaigns and product recommendations.
Social Media Interactions
Integrate likes, shares, comments, or mentions data.
Understand customer engagement and brand perception.
Identify brand advocates and influencer potential.
Key Performance Indicators (KPIs)
Establish clear KPIs for each RFM segment.
Monitor response rates, conversion rates, average order value, customer lifetime value, retention rates, or net promoter score.
Assess the effectiveness of marketing initiatives.
Dashboards
Develop interactive dashboards for each RFM segment.
Display key metrics, trends, and comparisons across segments.
Facilitate data-driven decision making.
Regular Updates
Update RFM segmentation model with the latest customer transaction data.
Adapt segmentation and marketing strategies to evolving customer preferences and market conditions.
A/B Testing
Evaluate the effectiveness of different marketing approaches for each segment.
Test variations in messaging, offers, channels, or creative elements.
Identify the most impactful strategies for each segment.
Customer Feedback
Actively seek and incorporate customer feedback.
Conduct surveys, focus groups, or customer interviews.
Gather qualitative insights into customer preferences, expectations, and experiences.
Refine segment definitions, identify new opportunities, and address gaps.
By integrating RFM segmentation with other models, enriching data, establishing performance metrics, and continuously refining the approach, businesses can create a comprehensive and dynamic customer segmentation strategy. This holistic approach enables companies to effectively target, engage, and retain customers, ultimately driving business growth and profitability in the ever-evolving e-commerce landscape.